The testing of optimization algorithms requires the running of
problems with ill-conditioned Hessians. For constrained problems,
it is the projection of the Hessian onto the space determined by
the active constraints that must be ill conditioned. In this note
it is argued that unless the Hessian and the constraints are
constructed together, the constrained Hessian is likely to be
well conditioned. The approach is to examine the effects of
random constraints on a singular Hessian.

Constrained Definite Hessians
Tend to be Well Conditioned

G.  W.  Stewart 


In testing and comparing optimization algorithms, it is important
to include test problems for which the Hessian matrix of the objective
function is ill conditioned, both at the optimum point and away from
it. For unconstrained optimization this is easy enough to do, and
problems with ill-conditioned or even singular Hessians appear frequently
in the literature.

For constrained optimization problems, however, the operative
condition number is usually that of the projection of the Hessian onto the
space of active constraints (or the tangent space in the case of
nonlinear constraints). It is the purpose of this note to show that such
a projection will tend to be well conditioned, even when the underlying
Hessian is singular.

It is easy to see that projection can only improve the condition of
a definite matrix. Specifically, for a positive definite matrix $H$ of
order $n$, define the condition number $\kappa(H)$ by

$$\kappa(H) = \lambda_{\max} / \lambda_{\min}, \tag{1}$$

where $\lambda_{\max}$ and $\lambda_{\min}$ are the largest and smallest eigenvalues of $H$
(n.b. this definition is appropriate only for positive definite matrices;
it does not generalize to indefinite or nonsymmetric matrices). A set of
constraints may be specified by a set $\{q_1, q_2, \ldots, q_p\}$ of independent
vectors, which, without loss of generality, may be taken to be
orthonormal. The constraint space is then the orthogonal complement of the
space spanned by $q_1, q_2, \ldots, q_p$. Thus if we set

$$Q_1 = (q_1, q_2, \ldots, q_p)$$

and determine an orthogonal matrix

$$Q = (Q_1 \;\; Q_2) \tag{2}$$

whose first $p$ columns are the vectors $q_1, q_2, \ldots, q_p$, then the
constraint space will be spanned by the columns of $Q_2$.

The constrained Hessian is

$$H_c = Q_2^T H Q_2. \tag{3}$$

It follows from standard results of matrix theory [2] that

$$\lambda_{\max}(H_c) \le \lambda_{\max}(H)$$

and

$$\lambda_{\min}(H_c) \ge \lambda_{\min}(H).$$

Hence from (1)

$$\kappa(H_c) \le \kappa(H).$$
Moreover, equality will be attained only if the constraint space contains
eigenvectors of $H$ corresponding to $\lambda_{\max}(H)$ and $\lambda_{\min}(H)$. This suggests
that unless $Q_2$ is specially chosen, $\kappa(H_c)$ can be appreciably smaller
than $\kappa(H)$. The rest of this paper is an attempt to give some quantitative
substance to this conjecture by examining the behavior of $\kappa(H_c)$ when
the constraint space is chosen at random.
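The inequality $\kappa(H_c) \le \kappa(H)$ can be illustrated numerically with a minimal Python sketch. It is not part of the original note: the spectrum `eigs`, the seed, and the helper names `orthonormal_columns` and `constrained_condition` are assumptions chosen for illustration. The sketch takes $H$ diagonal of order 5, draws two orthonormal columns to serve as $Q_2$ by orthogonalizing Gaussian vectors, and evaluates $\kappa(H_c)$ from the closed-form eigenvalues of the resulting 2-by-2 matrix.

```python
import math
import random


def orthonormal_columns(n, k, rng):
    """Return k orthonormal n-vectors built from Gaussian samples (Gram-Schmidt)."""
    cols = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for q in cols:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        cols.append([a / norm for a in v])
    return cols


def constrained_condition(eigs, u, w):
    """Condition number of Q2^T H Q2 for H = diag(eigs) and Q2 = (u w)."""
    a = sum(e * x * x for e, x in zip(eigs, u))
    b = sum(e * x * y for e, x, y in zip(eigs, u, w))
    c = sum(e * y * y for e, y in zip(eigs, w))
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    # Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]] are mean +/- radius.
    return (mean + radius) / (mean - radius)


rng = random.Random(1)                      # fixed seed, for reproducibility only
eigs = [1.0, 10.0, 100.0, 1000.0, 10000.0]  # illustrative spectrum of H
kappa_H = max(eigs) / min(eigs)

kappas = []
for _ in range(1000):
    u, w = orthonormal_columns(len(eigs), 2, rng)
    kappas.append(constrained_condition(eigs, u, w))

median = sorted(kappas)[len(kappas) // 2]
print(kappa_H, max(kappas), median)
```

The printed maximum cannot exceed $\kappa(H)$; how far below it the typical value falls is the question the remainder of the note addresses.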

For definiteness we shall consider the singular matrix

$$H = \begin{pmatrix} I_{n-1} & 0 \\ 0 & 0 \end{pmatrix}, \tag{4}$$

where $I_{n-1}$ is the identity matrix of order $n-1$. We shall determine a
randomly constrained Hessian by choosing a random orthogonal matrix $Q$
from the Haar distribution on the group of orthogonal matrices [1],
partitioning $Q$ as in (2), and defining $H_c$ by (3). The distribution
from which $Q$ is chosen is analogous to the uniform distribution.
Computationally such a $Q$ may be obtained by orthogonalizing a set of $n$
vectors whose components are identically distributed, independent, normal
random variables [3]. In particular, any row or column of such a matrix
has the distribution of a normalized vector of identically distributed,
independent, random normal variables.
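The computational recipe just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `random_orthogonal`, the use of modified Gram-Schmidt as the orthogonalization, and the values of `n`, `p`, and the seed are choices made here, not details given in the note or in [3].

```python
import math
import random


def random_orthogonal(n, rng):
    """Orthogonalize n Gaussian vectors with modified Gram-Schmidt.

    Returns a list of n orthonormal vectors, used here as the columns of Q.
    """
    columns = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Remove the components along the columns already accepted.
        for q in columns:
            dot = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - dot * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        columns.append([vi / norm for vi in v])
    return columns


rng = random.Random(0)        # fixed seed, for reproducibility only
n, p = 6, 2                   # illustrative sizes
cols = random_orthogonal(n, rng)
Q1, Q2 = cols[:p], cols[p:]   # first p columns and remaining n-p columns

# Largest deviation of Q^T Q from the identity matrix.
max_err = max(
    abs(sum(a * b for a, b in zip(cols[i], cols[j])) - (1.0 if i == j else 0.0))
    for i in range(n)
    for j in range(n)
)
print(max_err)
```

For large `n` a Householder-based QR factorization would be the more stable way to orthogonalize; Gram-Schmidt is used here only because it keeps the sketch short.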

The principal result is contained in the following theorem.

Theorem. Let $Q$ be a random orthogonal matrix from the Haar
distribution on the group of orthogonal matrices. Let $Q$ be partitioned as in
(2), where $Q_1$ has $p$ columns. Let $H$ be defined by (4) and
$H_c$ by (3). If $p \le n-2$, then

$$\kappa(H_c) = 1 + \frac{n-p}{p} F, \tag{5}$$

where $F$ has an F-distribution with $n-p$ and $p$ degrees of freedom.


Proof. Since $Q_2$ has at least two columns,

$$\lambda_{\max}(H_c) = 1.$$

Thus the problem becomes that of determining the distribution of
$\lambda_{\min}(H_c)$. We shall use the characterization

$$\lambda_{\min}(H_c) = \min_{\|x\|=1} x^T H_c x,$$

where $\|\cdot\|$ denotes the usual Euclidean norm.

Let $Q$ be partitioned in the form

$$Q = \begin{pmatrix} Q_{11} & Q_{12} \\ q_{21}^T & q_{22}^T \end{pmatrix},$$

where the first block column has $p$ columns, the second has $n-p$ columns,
and $q_{21}^T$ and $q_{22}^T$ make up the last row of $Q$.
Then $H_c = Q_{12}^T Q_{12}$. Hence

$$\lambda_{\min}(H_c) = \min_{\|x\|=1} x^T Q_{12}^T Q_{12} x = \min_{\|x\|=1} \left[ 1 - (q_{22}^T x)^2 \right], \tag{6}$$

the last equality following from the fact that the vector

$$\begin{pmatrix} Q_{12} \\ q_{22}^T \end{pmatrix} x$$

has norm one. The last expression in (6) is clearly minimized
when $x = q_{22} / \|q_{22}\|$, in which case

$$\lambda_{\min}(H_c) = 1 - \|q_{22}\|^2 = \|q_{21}\|^2,$$

since $\|(q_{21}^T, q_{22}^T)\| = 1$.

Let $y$ be an $n$-vector of identically distributed, independent
normal random variables, and partition $y^T = (y_1^T, y_2^T)$, where $y_1$
is a $p$-vector. Then by the observations made before the theorem,
$\lambda_{\min}(H_c)$ has the distribution of

$$\frac{\|y_1\|^2}{\|y\|^2} = \frac{\|y_1\|^2}{\|y_1\|^2 + \|y_2\|^2}.$$

Thus $\kappa(H_c) = 1/\lambda_{\min}(H_c)$ has the distribution of

$$\frac{\|y_1\|^2 + \|y_2\|^2}{\|y_1\|^2} = 1 + \frac{n-p}{p} \cdot \frac{p\,\|y_2\|^2}{(n-p)\,\|y_1\|^2} = 1 + \frac{n-p}{p} F,$$

where $F$ has an F-distribution with $n-p$ and $p$ degrees of
freedom.
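The algebraic steps of the proof can be checked on a single random draw. The Python sketch below is an illustration only; the sizes `n = 7`, `p = 3`, the seed, and the helper `random_orthogonal` are assumptions introduced here. It forms $H_c = Q_{12}^T Q_{12}$ explicitly, confirms that $x = q_{22}/\|q_{22}\|$ is an eigenvector with eigenvalue $\|q_{21}\|^2$, and confirms through the trace that the remaining $n-p-1$ eigenvalues sum to $n-p-1$, as they must if each equals one.

```python
import math
import random


def random_orthogonal(n, rng):
    """Columns of a random orthogonal matrix: Gram-Schmidt on Gaussian vectors."""
    cols = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for q in cols:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        cols.append([a / norm for a in v])
    return cols


rng = random.Random(2)            # fixed seed, for reproducibility only
n, p = 7, 3                       # illustrative sizes with p <= n - 2
m = n - p
cols = random_orthogonal(n, rng)  # cols[j][i] is the entry Q[i][j]

# H = diag(1, ..., 1, 0) discards the last row of Q2, so Hc = Q12^T Q12.
Hc = [[sum(cols[p + j][i] * cols[p + k][i] for i in range(n - 1))
       for k in range(m)] for j in range(m)]

q21 = [cols[j][n - 1] for j in range(p)]      # last row of Q, first p entries
q22 = [cols[p + j][n - 1] for j in range(m)]  # last row of Q, remaining entries
lam_min = sum(a * a for a in q21)             # claimed smallest eigenvalue

# x = q22 / ||q22|| should satisfy Hc x = lam_min x.
norm22 = math.sqrt(sum(a * a for a in q22))
x = [a / norm22 for a in q22]
Hx = [sum(Hc[j][k] * x[k] for k in range(m)) for j in range(m)]
residual = max(abs(Hx[j] - lam_min * x[j]) for j in range(m))

# If the other n-p-1 eigenvalues equal 1, trace(Hc) = (n-p-1) + lam_min.
trace_gap = abs(sum(Hc[j][j] for j in range(m)) - ((m - 1) + lam_min))

kappa = 1.0 / lam_min
print(residual, trace_gap, kappa)
```

Both `residual` and `trace_gap` should be at the level of rounding error, and `kappa` is then one sample from the distribution in (5).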

Note. The proof of the theorem can easily be extended to cover
the case $H = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{n-1}, 0)$, where
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{n-1} > 0$.
For this matrix

$$\kappa(H_c) \le \frac{\lambda_1}{\lambda_{n-1}} \left( 1 + \frac{n-p}{p} F \right),$$

where again $F$ has an F-distribution with $n-p$ and $p$ degrees of
freedom. We do not pursue this embellishment here because the simpler
case (4) adequately illustrates how likely projection is to produce
a well-conditioned matrix.

The well known properties of the F distribution along with (5)
can be used to determine what is a probable value of the condition number
of $H_c$. Table 1 gives values of $\mu_p$ such that for $n \ge p+2$

$$P\{\kappa(H_c) \le 1 + (n-p)\mu_p\} \ge 0.99. \tag{7}$$

Thus for $p = 3$, we shall observe $\kappa(H_c) \le 1 + 9.4(n-3)$ at least
ninety-nine percent of the time, and $n$ will have to be very large indeed to
produce the degree of ill-conditioning that would seriously discommode a
well constructed algorithm.
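The bound for $p = 3$ can be spot-checked by simulation. The sketch below is illustrative rather than part of the note: it assumes $n = 20$, a fixed seed, 20000 trials, and standard normal components for $y$, and it samples $\kappa(H_c)$ through the representation $1 + \|y_2\|^2 / \|y_1\|^2$ obtained in the proof instead of forming any matrices. The function name `sample_kappa` is hypothetical.

```python
import random


def sample_kappa(n, p, rng):
    """One draw of kappa(Hc) = 1 + ||y2||^2 / ||y1||^2 for a Gaussian n-vector y."""
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    head = sum(v * v for v in y[:p])   # ||y1||^2, first p components
    tail = sum(v * v for v in y[p:])   # ||y2||^2, remaining n-p components
    return 1.0 + tail / head


rng = random.Random(3)        # fixed seed, for reproducibility only
n, p, mu_p = 20, 3, 9.40      # mu_p is the Table 1 entry for p = 3
bound = 1.0 + (n - p) * mu_p  # right-hand side of the event in (7)
trials = 20000

hits = sum(sample_kappa(n, p, rng) <= bound for _ in range(trials))
fraction = hits / trials
print(bound, fraction)
```

The observed `fraction` is a Monte Carlo estimate, so it can fall slightly on either side of 0.99 from sampling error alone; it is meant as a sanity check on (7), not as a verification of the tabulated value.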

Table 1

   p      $\mu_p$ from (7)

   1      6370
   2      49.8
   3      9.40
   4      3.80
   5      2.10
   6      1.35
   7      0.960
   8      0.726
   9      0.576
  10      0.471

It would be wrong to conclude from this analysis that ill-conditioned
constrained Hessians do not occur in practical problems. Nature has a way
of confounding naive randomness assumptions by behaving in a distinctly
nonrandom and frequently perverse manner. However, to the extent that
ill-conditioned constrained Hessians occur in practice, to that extent
there is a need for test problems with such Hessians; and the above analysis
has implications for the construction of these problems. Namely, it is
not enough to choose an objective function with an ill-conditioned Hessian
and hope that unsystematically chosen constraints will preserve the
ill-conditioning; rather the constraints and the Hessian must be constructed
together.